Conversation
Pull request overview
This PR refactors and extends the instrumental-variables (IV) pipeline by integrating an IV-LLM-based instrument discovery flow into CAIS, adding the supporting IV-LLM package code, and updating related tooling/tests.
Changes:
- Added an `iv_discovery_tool` (and Pydantic I/O models) plus an `IVDiscovery` component that runs an IV-LLM pipeline (hypothesizer + critics).
- Introduced an `iv_llm` subpackage (prompts, agents, critics, utilities) and logging setup for IV-LLM runs.
- Updated agent/tool wiring and dataset cleaning execution behavior; added/updated tests.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| tests/cais/test_e2e_iv_new.py | New E2E IV test covering the updated IV pipeline (real LLM calls). |
| tests/cais/methods/test_diff_in_diff.py | New unit test intended to validate DiD estimator output structure. |
| cais/utils/llm_helpers.py | Refactors LLM base model import and adds invoke_llm helper. |
| cais/utils/agent.py | Adds a LangChain ReAct agent executor + custom output parser (LCEL-style agent). |
| cais/tools/iv_discovery_tool.py | New tool that discovers instruments via IV-LLM and updates Variables.instrument_variable. |
| cais/tools/controls_selector_tool.py | Enables LangChain @tool decorator and aligns imports. |
| cais/models.py | Adds IVDiscoveryInput / IVDiscoveryOutput models. |
| cais/iv_llm/src/variable_utils.py | Adds helpers for extracting/mapping LLM-proposed variable names to dataset columns. |
| cais/iv_llm/src/prompts/prompt_loader.py | Adds prompt loading/formatting helper for IV-LLM prompts. |
| cais/iv_llm/src/prompts/independence_critic.txt | Adds IV independence-critic prompt template. |
| cais/iv_llm/src/prompts/hypothesizer.txt | Adds IV hypothesizer prompt template with “only from columns list” constraint. |
| cais/iv_llm/src/prompts/exclusion_critic.txt | Adds IV exclusion-restriction prompt template. |
| cais/iv_llm/src/prompts/confounder_miner.txt | Adds confounder miner prompt template with “only from columns list” constraint. |
| cais/iv_llm/src/prompts/init.py | Marks prompts as a package. |
| cais/iv_llm/src/llm/client.py | Adds a thin LLM client adapter for IV-LLM code. |
| cais/iv_llm/src/llm/init.py | Marks llm as a package. |
| cais/iv_llm/src/experiments/iv_co_scientist.py | Adds an experimental IV co-scientist runner. |
| cais/iv_llm/src/critics/independence_critic.py | Adds critic implementation using IV-LLM prompts. |
| cais/iv_llm/src/critics/exclusion_critic.py | Adds critic implementation using IV-LLM prompts. |
| cais/iv_llm/src/critics/init.py | Marks critics as a package. |
| cais/iv_llm/src/agents/hypothesizer.py | Adds IV hypothesizer agent with column-grounding logic. |
| cais/iv_llm/src/agents/confounder_miner.py | Adds confounder-miner agent with column-grounding logic. |
| cais/iv_llm/src/agents/init.py | Marks agents as a package. |
| cais/iv_llm/src/init.py | Marks IV-LLM src as a package. |
| cais/iv_llm/init.py | Adds IV-LLM log file handler configuration on package import. |
| cais/components/iv_discovery.py | Adds IV discovery component orchestrating hypothesizer + critics. |
| cais/components/dataset_cleaner.py | Improves generated-script execution plumbing and adds fallback behavior/logging. |
| cais/agent.py | Wires IV discovery into CausalAgent and adjusts controls/cleaning behavior and logging. |
| cais/init.py | Exposes iv_discovery_tool via top-level package imports. |
```python
def test_iv_llm_pipeline_app_engagement_push(self):
    """Run the full CAIS pipeline end-to-end (real API calls, no mocks)."""
```
This E2E test performs real LLM/API calls but doesn't skip when credentials are missing (unlike the other E2E tests in this repo). This will make CI fail or incur unexpected costs. Add a setUpClass that loads dotenv and raises unittest.SkipTest when the required API key env var isn't present.
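A minimal sketch of such a guard (the `OPENAI_API_KEY` name and the `python-dotenv` dependency are assumptions — match them to whatever the pipeline's LLM client actually reads):

```python
import os
import unittest

try:
    from dotenv import load_dotenv  # python-dotenv, as the other E2E tests use
except ImportError:  # keep the sketch importable without the optional dependency
    def load_dotenv():
        return None


class TestIVLLMPipelineE2E(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        load_dotenv()
        # Skip the entire class when credentials are absent, so CI stays green
        # and no paid API calls are made. The env var name is an assumption.
        if not os.getenv("OPENAI_API_KEY"):
            raise unittest.SkipTest("OPENAI_API_KEY not set; skipping real-API E2E test")
```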
```python
# Mock all helper/validation functions within diff_in_diff.py
@patch('cais.methods.diff_in_diff.identify_time_variable')
@patch('cais.methods.diff_in_diff.identify_treatment_group')
@patch('cais.methods.diff_in_diff.determine_treatment_period')
@patch('cais.methods.diff_in_diff.validate_parallel_trends')
# Mock estimate_did_model to avoid actual regression, return mock results
@patch('cais.methods.diff_in_diff.estimate_did_model')
def test_estimate_effect_structure_and_types(self, mock_estimate_model, mock_validate_trends,
```
The mocked patch paths reference cais.methods.diff_in_diff.*, which does not exist in the codebase. After fixing the import, these decorators should patch the helpers in the actual module where they are defined (e.g., cais.methods.difference_in_differences.estimator.*), otherwise the test won’t mock anything.
```python
import yaml
import json
import pandas as pd
from ..causal_analysis.correlation import analyze_data
from ..agents.human_proxy import HumanProxy
from ..agents.causal_oracle import CausalOracle
from ..agents.hypothesizer import Hypothesizer
from ..agents.confounder_miner import ConfounderMiner
from ..critics.exclusion_critic import ExclusionCritic
from ..critics.independence_critic import IndependenceCritic
from ..agents.grounder import Grounder
from ..analysis.iv_estimator import IVEstimator
from ..llm.client import LLMClient
import os


class IVCoScientist:
    def __init__(self, config):
        # Accept either config dict or config path
        if isinstance(config, str):
            with open(config, 'r') as f:
                self.config = yaml.safe_load(f)
        else:
```
This file imports yaml (PyYAML), but PyYAML is not declared in pyproject.toml dependencies (it’s only pinned in requirements.txt). If this experiment module is meant to be usable from the installed package, add PyYAML to project.dependencies or defer the import behind a runtime guard so installs don’t break when users try to run it.
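A sketch of the runtime-guard option, so a missing PyYAML fails only when YAML loading is actually requested, not at package import time (`load_config` is a hypothetical helper name):

```python
# Defer the PyYAML import: absence becomes a clear, actionable error at the
# point of use instead of an ImportError on package import.
try:
    import yaml
except ImportError:
    yaml = None


def load_config(path):
    """Load a YAML config file, with a clear error if PyYAML is absent."""
    if yaml is None:
        raise ImportError(
            "PyYAML is required to load YAML configs; install it with "
            "`pip install pyyaml` or declare it in project.dependencies."
        )
    with open(path, 'r') as f:
        return yaml.safe_load(f)
```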
```python
# Import the function to test
from cais.methods.diff_in_diff import estimate_effect
```
from cais.methods.diff_in_diff import estimate_effect will fail because there is no cais/methods/diff_in_diff.py module in the repo. Import the DiD estimator from cais.methods.difference_in_differences.estimator (or from cais.methods’ exported alias) and update the patch targets accordingly.
```python
# Set up basic logging
os.makedirs('./logs/', exist_ok=True)
logging.basicConfig(
    filename='./logs/agent_debug.log',
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
```
logging.basicConfig(...) at import time reconfigures global logging for any library consumer and for the whole test suite. Prefer leaving logging configuration to the application entrypoint/CLI; here, set a module logger and emit logs without calling basicConfig (or gate it behind if __name__ == "__main__").
Suggested change:

```diff
- # Set up basic logging
- os.makedirs('./logs/', exist_ok=True)
- logging.basicConfig(
-     filename='./logs/agent_debug.log',
-     level=logging.INFO,
-     format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
- )
- logger = logging.getLogger(__name__)
+ # Set up module logger
+ logger = logging.getLogger(__name__)
+ if __name__ == "__main__":
+     # Configure basic logging only when this module is executed as a script
+     os.makedirs('./logs/', exist_ok=True)
+     logging.basicConfig(
+         filename='./logs/agent_debug.log',
+         level=logging.INFO,
+         format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+     )
```
```python
variables=self.variables,
dataset_analysis=self.dataset_analysis,
dataset_description=self.dataset_description,
original_query=query
```
IV discovery is invoked via iv_discovery_tool.func(...) without passing the agent’s configured LLM/provider/model. The current IV discovery component creates its own default LLM client, which can lead to inconsistent behavior (and failures) when the agent is configured for a non-default provider/model. Consider passing self.llm through the tool/component so all steps share the same client/config.
Suggested change:

```diff
- original_query=query
+ original_query=query,
+ llm=self.llm,
```
````text
Final Answer: [your response here]
```

DO NOT UNDER ANY CIRCUMSTANCE CALL MORE THAN ONE TOOL IN A STEP
````
The system prompt forbids multiple tool calls in a step, but the output parser’s format instructions explicitly allow repeating Action/Action Input blocks (multi-tool calls). This contradiction will confuse the model and increase parsing failures—either enforce single-action parsing or update the prompt/instructions to match the intended behavior.
Suggested change:

```diff
- DO NOT UNDER ANY CIRCUMSTANCE CALL MORE THAN ONE TOOL IN A STEP
+ In each step (each Thought/Action/Action Input/Observation block), you MUST call at most one tool. You MAY repeat this step format multiple times (with new Thoughts, Actions, Action Inputs, and Observations) before giving the Final Answer.
```
```python
# Get confounders from variables if available
confounders = variables.covariates or []
```
This tool uses variables.covariates as the confounder list for IV validation. In this codebase, confounders is a distinct field on Variables (used by propensity-score methods), so this likely drops the actual confounders identified by query_interpreter. Use variables.confounders (or a union of confounders and covariates if that’s intended) to avoid validating independence against the wrong set.
Suggested change:

```diff
- # Get confounders from variables if available
- confounders = variables.covariates or []
+ # Get confounders from variables if available.
+ # Prefer the dedicated `confounders` field, but also include any covariates.
+ base_confounders = variables.confounders or []
+ additional_covariates = variables.covariates or []
+ # Build a unified list without duplicates, preserving order.
+ confounders = list(dict.fromkeys(base_confounders + additional_covariates))
```
```python
for m in matches:
    tool_name = m.group(1).strip()
    tool_input = m.group(2).strip().strip('"')
    print('\n--------------------------')
    print(tool_input)
    print('--------------------------')
    actions.append(AgentAction(tool_name, json.loads(tool_input), text))

return actions
```
json.loads(tool_input) is called without error handling. If the model emits invalid JSON (common in practice), this will raise JSONDecodeError instead of OutputParserException, bypassing handle_parsing_errors and potentially crashing the agent. Catch JSON decode errors and raise OutputParserException with a helpful message/observation.
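A sketch of the guarded parse (the stand-in exception class keeps the sketch runnable without LangChain installed; adapt the helper to the parser's actual structure):

```python
import json

try:
    from langchain_core.exceptions import OutputParserException
except ImportError:  # keep the sketch runnable without LangChain installed
    class OutputParserException(ValueError):
        pass


def parse_tool_input(tool_input: str) -> dict:
    """Decode a model-emitted Action Input, surfacing bad JSON as a parser error.

    Raising OutputParserException instead of letting JSONDecodeError escape lets
    AgentExecutor's handle_parsing_errors feed the problem back to the model as
    an observation rather than crashing the run.
    """
    try:
        return json.loads(tool_input)
    except json.JSONDecodeError as exc:
        raise OutputParserException(
            f"Could not parse Action Input as JSON: {exc}. "
            "Re-emit the Action Input as a single valid JSON object."
        ) from exc
```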
```python
memory=memory,  # Pass the memory object
verbose=True,
callbacks=[ConsoleCallbackHandler()],  # Optional: for console debugging
handle_parsing_errors=True,  # Let AE handle parsing errors
max_retries = 100
```
AgentExecutor(...) is constructed with max_retries=100, but in LangChain 0.3.x AgentExecutor doesn’t accept a max_retries kwarg (it uses max_iterations/max_execution_time depending on version). This will raise TypeError at runtime. Use the supported limit parameters or implement retry logic outside the executor.
Suggested change:

```diff
  memory=memory,  # Pass the memory object
  verbose=True,
  callbacks=[ConsoleCallbackHandler()],  # Optional: for console debugging
  handle_parsing_errors=True,  # Let AE handle parsing errors
- max_retries = 100
+ max_iterations=100
```